54 research outputs found
A Style-Based Generator Architecture for Generative Adversarial Networks
We propose an alternative generator architecture for generative adversarial
networks, borrowing from style transfer literature. The new architecture leads
to an automatically learned, unsupervised separation of high-level attributes
(e.g., pose and identity when trained on human faces) and stochastic variation
in the generated images (e.g., freckles, hair), and it enables intuitive,
scale-specific control of the synthesis. The new generator improves the
state-of-the-art in terms of traditional distribution quality metrics, leads to
demonstrably better interpolation properties, and also better disentangles the
latent factors of variation. To quantify interpolation quality and
disentanglement, we propose two new, automated methods that are applicable to
any generator architecture. Finally, we introduce a new, highly varied and
high-quality dataset of human faces.
Comment: CVPR 2019 final version
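To make the architecture concrete, here is a minimal PyTorch sketch of the style-based idea, not the official StyleGAN implementation: a mapping network turns the latent z into an intermediate code w, each synthesis layer is restyled from w via an AdaIN-like modulation, and per-layer noise injects stochastic detail. Module names and sizes are illustrative assumptions.

```python
import torch
import torch.nn as nn

class MappingNetwork(nn.Module):
    """Maps latent z to the intermediate latent w (an 8-layer MLP in the paper)."""
    def __init__(self, z_dim=512, w_dim=512, depth=8):
        super().__init__()
        layers = []
        for i in range(depth):
            layers += [nn.Linear(z_dim if i == 0 else w_dim, w_dim),
                       nn.LeakyReLU(0.2)]
        self.net = nn.Sequential(*layers)

    def forward(self, z):
        return self.net(z)

class StyledConvBlock(nn.Module):
    """One synthesis layer: conv, per-pixel noise, then AdaIN-style modulation."""
    def __init__(self, in_ch, out_ch, w_dim=512):
        super().__init__()
        self.conv = nn.Conv2d(in_ch, out_ch, 3, padding=1)
        self.to_style = nn.Linear(w_dim, 2 * out_ch)     # per-channel scale/bias
        self.noise_scale = nn.Parameter(torch.zeros(1, out_ch, 1, 1))
        self.norm = nn.InstanceNorm2d(out_ch)

    def forward(self, x, w):
        x = self.conv(x)
        x = x + self.noise_scale * torch.randn_like(x)   # stochastic detail
        scale, bias = self.to_style(w).chunk(2, dim=1)
        x = self.norm(x)                                 # normalize, then restyle
        return x * (1 + scale[..., None, None]) + bias[..., None, None]
```

Because every layer reads its style from w independently, feeding different w vectors to different resolutions (style mixing) is what yields the scale-specific control the abstract describes.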
StyleGAN-T: Unlocking the Power of GANs for Fast Large-Scale Text-to-Image Synthesis
Text-to-image synthesis has recently seen significant progress thanks to
large pretrained language models, large-scale training data, and the
introduction of scalable model families such as diffusion and autoregressive
models. However, the best-performing models require iterative evaluation to
generate a single sample. In contrast, generative adversarial networks (GANs)
only need a single forward pass. They are thus much faster, but they currently
remain far behind the state-of-the-art in large-scale text-to-image synthesis.
This paper aims to identify the necessary steps to regain competitiveness. Our
proposed model, StyleGAN-T, addresses the specific requirements of large-scale
text-to-image synthesis, such as large capacity, stable training on diverse
datasets, strong text alignment, and controllable variation vs. text alignment
tradeoff. StyleGAN-T significantly improves over previous GANs and outperforms
distilled diffusion models - the previous state-of-the-art in fast
text-to-image synthesis - in terms of sample quality and speed.
Comment: Project page: https://sites.google.com/view/stylegan-t
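The speed argument reduces to a count of network evaluations. The hedged sketch below, with hypothetical `gan` and `denoiser` callables (none of this is StyleGAN-T's actual API), shows why a GAN pays one forward pass per image while a diffusion sampler pays one per step:

```python
import torch

@torch.no_grad()
def sample_gan(gan, text_emb, z_dim=512):
    # One forward pass per batch of images.
    z = torch.randn(text_emb.shape[0], z_dim)
    return gan(z, text_emb)

@torch.no_grad()
def sample_diffusion(denoiser, text_emb, steps=50, shape=(3, 64, 64)):
    # `steps` forward passes per batch: iterative refinement from pure noise.
    x = torch.randn(text_emb.shape[0], *shape)
    for t in reversed(range(steps)):
        eps = denoiser(x, t, text_emb)   # predicted noise at step t
        x = x - eps / steps              # placeholder update; real samplers
    return x                             # (DDPM/DDIM) follow a noise schedule
```

Distillation shrinks the diffusion step count, but the GAN path stays a single evaluation, which is the speed edge the paper builds on.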
COEGAN: Evaluating the Coevolution Effect in Generative Adversarial Networks
Generative adversarial networks (GANs) present state-of-the-art results in the
generation of samples following the distribution of the input dataset. However,
GANs are difficult to train, and several aspects of the model must be designed
by hand in advance. Neuroevolution is a well-known technique for automatically
designing network architectures that has recently been extended to deep neural
networks. COEGAN is a model that uses neuroevolution
and coevolution in the GAN training algorithm to provide a more stable training
method and the automatic design of neural network architectures. COEGAN makes
use of the adversarial aspect of the GAN components to implement coevolutionary
strategies in the training algorithm. We evaluated our proposal on the
Fashion-MNIST and MNIST datasets. We compare our results with a baseline based
on DCGAN and also with results from a random search algorithm. We show that our
method is able to discover efficient architectures on the Fashion-MNIST and
MNIST datasets. The results also suggest that COEGAN can be used as a training
algorithm for GANs to avoid common issues, such as the mode collapse problem.
Comment: Published in GECCO 2019. arXiv admin note: text overlap with arXiv:1912.0617
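As a rough illustration of the coevolutionary loop, the runnable toy below evolves two populations against each other. The "architectures" are just lists of layer widths and `toy_score` stands in for briefly training a real generator-discriminator pair and scoring sample quality (e.g. an FID-style metric); every helper here is a simplification, not COEGAN's actual implementation.

```python
import random

WIDTHS = [16, 32, 64]

def make_ind():
    # Toy individual: an "architecture" is just a list of layer widths.
    return {"layers": [random.choice(WIDTHS)], "fitness": 0.0}

def mutate(ind):
    # Toy architectural mutation: append a layer. Real neuroevolution also
    # changes layer types, widths, and hyperparameters.
    return {"layers": ind["layers"] + [random.choice(WIDTHS)], "fitness": 0.0}

def toy_score(g, d):
    # Stand-in for "train the (g, d) pair briefly, then measure quality".
    return random.random() - 0.1 * abs(len(g["layers"]) - len(d["layers"]))

def select_and_mutate(pop, k_best):
    pop = sorted(pop, key=lambda i: i["fitness"], reverse=True)
    parents = pop[:k_best]
    children = [mutate(random.choice(parents)) for _ in range(len(pop) - k_best)]
    return parents + children

def coevolve(pop_size=6, generations=10, k_best=2):
    gens = [make_ind() for _ in range(pop_size)]
    discs = [make_ind() for _ in range(pop_size)]
    for _ in range(generations):
        for g in gens:                       # each generator meets a rival...
            g["fitness"] = toy_score(g, random.choice(discs))
        for d in discs:                      # ...and vice versa, adversarially
            d["fitness"] = -toy_score(random.choice(gens), d)
        gens = select_and_mutate(gens, k_best)
        discs = select_and_mutate(discs, k_best)
    return gens, discs
```

The adversarial pairing is the key point: each population's fitness is evaluated against members of the other, so the two populations drive each other's architectural search.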
Generative Novel View Synthesis with 3D-Aware Diffusion Models
We present a diffusion-based model for 3D-aware generative novel view
synthesis from as few as a single input image. Our model samples from the
distribution of possible renderings consistent with the input and, even in the
presence of ambiguity, is capable of rendering diverse and plausible novel
views. To achieve this, our method makes use of existing 2D diffusion backbones
but, crucially, incorporates geometry priors in the form of a 3D feature
volume. This latent feature field captures the distribution over possible scene
representations and improves our method's ability to generate view-consistent
novel renderings. In addition to generating novel views, our method has the
ability to autoregressively synthesize 3D-consistent sequences. We demonstrate
state-of-the-art results on synthetic renderings and room-scale scenes; we also
show compelling results for challenging, real-world objects.
Comment: Project page: https://nvlabs.github.io/genv
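A rough structural sketch of that pipeline follows, with invented module names and toy shapes; the real model uses a proper 2D diffusion UNet and ray-based feature rendering, neither of which is reproduced here.

```python
import torch
import torch.nn as nn

class NovelViewDiffusion(nn.Module):
    def __init__(self, feat_ch=16, depth=32):
        super().__init__()
        # Toy "lift" from a 2D image to a 3D feature volume; the real model
        # uses a far richer encoder.
        self.encoder = nn.Conv2d(3, feat_ch * depth, 3, padding=1)
        self.feat_ch, self.depth = feat_ch, depth
        # Stand-in for a 2D diffusion backbone conditioned on rendered features.
        self.denoiser = nn.Conv2d(3 + feat_ch, 3, 3, padding=1)

    def lift(self, img):
        b, _, h, w = img.shape
        feats = self.encoder(img)
        return feats.view(b, self.feat_ch, self.depth, h, w)  # 3D feature volume

    def render_features(self, volume, pose):
        # Placeholder projection: collapse the depth axis. A real renderer
        # would ray-march the volume under the target camera `pose`.
        return volume.mean(dim=2)

    def denoise_step(self, x_t, cond_feats):
        # One denoising step of the 2D backbone, conditioned on geometry.
        return self.denoiser(torch.cat([x_t, cond_feats], dim=1))
```

Sampling a target view would repeat `denoise_step` over a noise schedule; feeding each finished view back in as additional conditioning is one way to realize the autoregressive, 3D-consistent sequences described above.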